95 research outputs found
Generation and Applications of Knowledge Graphs in Systems and Networks Biology
The acceleration in the generation of data in the biomedical domain has necessitated the use of computational approaches to assist in its interpretation. However, these approaches rely on the availability of high-quality, structured, formalized biomedical knowledge. This thesis has two goals: to improve methods for curation and semantic data integration in order to generate high-granularity biological knowledge graphs, and to develop novel methods for using prior biological knowledge to propose new biological hypotheses. The first two publications describe an ecosystem for handling biological knowledge graphs encoded in the Biological Expression Language throughout the stages of curation, visualization, and analysis. The next two publications describe the reproducible acquisition and integration of high-granularity knowledge with low contextual specificity from structured biological data sources on a massive scale, and support the semi-automated curation of new content at high speed and precision. After building the ecosystem and acquiring content, the last three publications in this thesis demonstrate three different applications of biological knowledge graphs in modeling and simulation. The first demonstrates the use of agent-based modeling to simulate neurodegenerative disease biomarker trajectories using biological knowledge graphs as priors. The second applies network representation learning to prioritize nodes in biological knowledge graphs based on corresponding experimental measurements in order to identify novel targets. Finally, the third uses biological knowledge graphs and develops algorithms to deconvolute the mechanisms of action of drugs, which could also serve to identify drug repositioning candidates. Ultimately, this thesis lays the groundwork for production-level applications of drug repositioning algorithms and other knowledge-driven approaches to analyzing biomedical experiments.
Wavelet-Packet Powered Deepfake Image Detection
As neural networks become more capable of generating realistic artificial images, they have the potential to improve movies, music, and video games, and to make the internet an even more creative and inspiring place. Yet, at the same time, the latest technology potentially enables new digital ways to lie. In response, the need arises for a diverse and reliable toolbox to identify artificial images and other content. Previous work primarily relies on pixel-space CNNs or the Fourier transform. To the best of our knowledge, wavelet-based GAN analysis and detection methods have been absent thus far. This paper aims to fill this gap and describes a wavelet-based approach to GAN-generated image analysis and detection. We evaluate our method on FFHQ, CelebA, and LSUN source identification problems and find improved or competitive performance.
Comment: Source code is available at https://github.com/gan-police/frequency-forensic
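The wavelet-based analysis above rests on decomposing an image into frequency subbands. As a minimal, self-contained sketch of the idea (not the authors' implementation, which lives in the linked repository), a single-level 2D Haar decomposition can be written in NumPy; the `haar2d` name and the averaging normalization are illustrative choices:

```python
import numpy as np

def haar2d(img: np.ndarray) -> dict:
    """Single-level 2D Haar decomposition of an image into four subbands.

    A toy illustration of the kind of frequency decomposition that
    wavelet (packet) analyses build on; not the paper's implementation.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    return {
        "LL": (a + b + c + d) / 4.0,  # approximation (low-low)
        "LH": (a + b - c - d) / 4.0,  # horizontal detail
        "HL": (a - b + c - d) / 4.0,  # vertical detail
        "HH": (a - b - c + d) / 4.0,  # diagonal detail
    }

# A constant image has all its energy in the approximation band.
bands = haar2d(np.full((8, 8), 7.0))
```

Generated images often carry characteristic artifacts in the high-frequency detail bands (LH, HL, HH), which is what makes such decompositions attractive as features for detection.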
PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings
Recently, knowledge graph embeddings (KGEs) received significant attention,
and several software libraries have been developed for training and evaluating
KGEs. While each of them addresses specific needs, we re-designed and
re-implemented PyKEEN, one of the first KGE libraries, in a community effort.
PyKEEN 1.0 enables users to compose knowledge graph embedding models (KGEMs)
based on a wide range of interaction models, training approaches, loss
functions, and permits the explicit modeling of inverse relations. In addition, an automatic memory optimization has been implemented to optimally exploit the available hardware, and extensive hyper-parameter optimization (HPO) functionality is provided through the integration of Optuna.
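An "interaction model" here is the scoring function that maps head, relation, and tail embeddings to a plausibility score. As a hedged illustration (a toy function for exposition, not PyKEEN's actual API), the classic TransE interaction can be sketched in a few lines of NumPy:

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: int = 1) -> float:
    """TransE interaction model: score(h, r, t) = -||h + r - t||_p.

    Relations are modeled as translations in embedding space; higher
    (less negative) scores indicate more plausible triples.
    """
    return -float(np.linalg.norm(h + r - t, ord=p))

# Toy embeddings in which the relation translates the head exactly onto the tail.
h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = np.array([1.0, 1.0])
true_score = transe_score(h, r, t)      # exact match scores 0.0
corrupt_score = transe_score(h, r, -t)  # a corrupted tail scores worse
```

In a library such as PyKEEN, an interaction function like this is composed with a training approach (e.g., negative sampling), a loss function, and optionally inverse-relation modeling to form a complete KGEM.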
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework
The heterogeneity in recently published knowledge graph embedding models'
implementations, training, and evaluation has made fair and thorough
comparisons difficult. In order to assess the reproducibility of previously
published results, we re-implemented and evaluated 21 interaction models in the
PyKEEN software package. Here, we outline which results could be reproduced
with their reported hyper-parameters, which could only be reproduced with
alternate hyper-parameters, and which could not be reproduced at all, and we provide insight into why this might be the case.
We then performed a large-scale benchmarking study on four datasets, comprising several thousand experiments and 24,804 GPU hours of computation time. We present the insights gained regarding best practices, the best configuration for each model, and where improvements could be made over previously published best configurations.
Our results highlight that a model's performance is determined not by its architecture alone, but crucially by the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations. We provide evidence that several architectures can obtain results competitive with the state of the art when configured carefully. We have made all code, experimental configurations, results, and analyses that led to our interpretations available at https://github.com/pykeen/pykeen and
https://github.com/pykeen/benchmarkin
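One factor the benchmark identifies as crucial is the explicit modeling of inverse relations. A common formulation (sketched here with a hypothetical helper, not necessarily PyKEEN's exact internals) augments the training triples so that each inverse relation receives its own embedding:

```python
def add_inverse_triples(triples, num_relations):
    """For every (head, relation, tail) triple, add (tail, relation + num_relations, head).

    Relation IDs in [num_relations, 2 * num_relations) then denote the
    learned inverse of the corresponding forward relation.
    """
    inverse = [(t, r + num_relations, h) for (h, r, t) in triples]
    return triples + inverse

# Two triples over relations {0, 1}; augmentation adds their inverses as relations {2, 3}.
triples = [(0, 0, 1), (1, 1, 2)]
augmented = add_inverse_triples(triples, num_relations=2)
```

Training on the augmented set lets a model answer queries in both directions without requiring the interaction function itself to be symmetric.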
The Human Phenotype Ontology in 2024: phenotypes around the world.
The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2,239 new HPO terms and 49,235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.
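To make the semantic-similarity use case concrete, here is a minimal sketch of one simple measure over an ontology's subclass hierarchy: shared-ancestor (Jaccard) similarity. The term IDs and the `parents` mapping below are made up for illustration, and real HPO analyses typically use information-content-based measures rather than this toy variant:

```python
def ancestors(term, parents):
    """Return a term together with all of its ancestors in an ontology DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def jaccard_similarity(a, b, parents):
    """Shared-ancestor (Jaccard) similarity between two ontology terms."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

# Toy hierarchy with hypothetical IDs (not real HPO terms); HP:A is the root.
parents = {"HP:B": ["HP:A"], "HP:C": ["HP:A"], "HP:D": ["HP:B"]}
sim = jaccard_similarity("HP:D", "HP:C", parents)  # the two terms share only HP:A
```

Terms that sit close together in the hierarchy share more ancestors and thus score higher, which is the intuition behind phenotype-driven diagnostic matching.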
biopragmatics/semra: v0.0.4
Semantic Mapping Reasoning Assembler (SeMRA): tooling for semantic mapping
Easy ORCID
The first-party ORCID data dump uses a data model that is overly complex for most use cases. This record contains a derived version that is much more straightforward, accessible, and smaller. It also includes a pre-built Gilda index for named entity recognition (NER) and named entity normalization (NEN). It is automatically generated with code in https://github.com/cthoyt/orcid_downloader.
Open Data, Open Code, Open Infrastructure Schematic Diagram
A schematic diagram of how social workflows, technical workflows, and project governance interact with the open data, open code, and open infrastructure (O3).
biopragmatics/bioontologies: v0.4.1
What's Changed
Improve availability check by @cthoyt in https://github.com/biopragmatics/bioontologies/pull/15
Full Changelog: https://github.com/biopragmatics/bioontologies/compare/v0.4.0...v0.4.
- …